Templeton's Features List

The features available in Templeton have been divided into categories for ease of browsing:

Mirroring
Restrictions
Log Files
Network
Advanced Features

Mirroring

Several options are available when mirroring with Templeton:

Copying. Templeton retrieves HTML documents, inline images, and linked files to the local computer system. All links traversed are retrieved, regardless of file format. Templeton even retrieves some clickable image maps.
Link rewriting. HTML documents that are copied have their links rewritten automatically so that they may be used by local browsers without requiring internet access. Furthermore, the links are written using relative file names. This allows for easy file relocation (just move the entire subtree) and for use without a local WWW server.
Saving. Templeton stores files using either long file names or the DOS FAT 8.3 file name format. For DOS-based computer systems (including Microsoft Windows and OS/2 using a FAT file system) the retrieved files are stored in a truncated 8.3 format. Under operating systems that support long file names, such as OS/2 using HPFS and Unix, Templeton will store files with long, descriptive names. You may also specify the FAT file name format when exporting to DOS-based machines.
File Overwriting. Templeton may be configured to overwrite existing files from a previous mirror, to retrieve only modified files, or to skip files that already exist from a previous run.
Simple HTML corrections. One of the most common types of error in HTML documents is the (unintentional) omission of quotation marks. Most HTML browsers forgive this typographical error; Templeton corrects it.
Link removal. When a hyperlink is not traversed, Templeton can be configured to either remove the link or leave the untraversed link in place.
Mapping only. Sometimes it is not desirable to create a mirror image of a web site. Templeton can be configured to map remote sites without retrieving files.
Server Identification. For some tasks, it is helpful if the type of WWW server is known. Templeton generates a list of WWW server names and types.
E-mail lists. By popular request, Templeton can generate a list of all e-mail addresses that it finds. This is useful for automated mailing lists and contact information.
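
As an illustration of the link rewriting described above, the following is a minimal sketch of how a relative link between two mirrored files could be computed. It is not Templeton's actual code. The local layout (host name followed by URL path, with index.html standing in for directory URLs), the function names, and the image URL are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a hypothetical local path: host name followed by URL path."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # assumed file name for directory URLs
    return parts.netloc + path


def relative_link(from_url, to_url):
    """Link to write into the local copy of from_url so it reaches to_url's copy."""
    source_dir = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start=source_dir)


# The image URL below is made up for illustration.
print(relative_link("http://www.cs.tamu.edu/faculty/index.html",
                    "http://www.cs.tamu.edu/images/logo.gif"))
```

Because the rewritten link depends only on the relative positions of the two files, the whole subtree can be moved without breaking it.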

Restrictions

To prevent unwanted wandering of Templeton across the entire World Wide Web, the search may be restricted. Templeton supports the following types of restrictions.

Host restriction. Templeton may be explicitly told not to traverse other WWW servers. The restricted server may be listed as any of the following:
1. Current host. Templeton will not traverse links that leave the initial WWW server. This is the most common type of restriction.
2. Subnet. Links within a subnet may be traversed, but WWW servers outside of the subnet are not visited. This is especially useful when your school or company maintains a number of WWW servers but you do not wish to mirror the entire World Wide Web.
3. Domain Name. In many cases, a company or school may exist on multiple subnets, but maintain the same domain name. By restricting to a domain name, these servers may be mirrored or mapped without traversing the entire World Wide Web. An example of a restricted domain name is ".intel.com", which allows only machine names that are in the Intel domain. This would allow "www.intel.com" and "gopher.intel.com" but not "www.intel.chips.com" nor the machine "intel.com". (These are just examples, not necessarily real machine names.)
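
The domain name rule above behaves like a suffix match on the host name. A minimal sketch, assuming a plain case-insensitive suffix comparison (the function name is hypothetical and this is not necessarily how Templeton implements the check):

```python
def host_in_domain(host, domain_suffix):
    """Return True if host ends with the restricted suffix, e.g. ".intel.com".

    The leading dot is part of the suffix, so the bare machine name
    "intel.com" does not match ".intel.com".
    """
    return host.lower().endswith(domain_suffix.lower())


for host in ("www.intel.com", "gopher.intel.com",
             "www.intel.chips.com", "intel.com"):
    print(host, host_in_domain(host, ".intel.com"))
```

The four host names are the examples from the text: the first two match and the last two do not.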
Path Restriction. When restricting to a single WWW server, you may also wish to restrict to a specific subdirectory on that server. For example, if you are interested only in the faculty at the Texas A&M Computer Science Department, then you may wish to restrict to http://www.cs.tamu.edu/faculty/. HTML documents not within the faculty subdirectory would not be retrieved.
Depth Restriction. Templeton processes links in a breadth-first search pattern. In a breadth-first search, all links from a document are traversed, then all links from the traversed documents are followed. By restricting the depth of the search, you limit the number of links to be followed. You should be cautious since a breadth-first search may exponentially increase the number of links to follow at each depth.*
Robot Exclusion. Applications that search the World Wide Web, such as Templeton, are referred to as Web Robots. Many WWW servers do not allow web robots to traverse the available information. Why not? Some robots are not nice and generate so many requests in a short amount of time that the WWW server slows to a crawl or breaks down. Other web robots try to index (or mirror or map) proprietary, copyrighted, or temporary information. Finally (and most commonly) some robots become stuck traversing infinite virtual databases such as Yahoo.com, Tiger Census Maps, or Mud Games. Templeton supports robot exclusion and can be configured to avoid restricted paths on a server.
Custom Restrictions. Templeton can be configured to traverse (or not traverse) URLs based on user-specified criteria. These criteria may name specific directories or specific file types. Wildcard characters, representing one or more characters, are permitted.
Basic Authentication. Templeton supports basic WWW authentication. Users are prompted for a name and password when accessing protected documents. Templeton can also read the encoded password from a configuration file.
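
To make the depth restriction described above concrete, here is a minimal sketch of a depth-limited breadth-first traversal. It is an illustration only, not Templeton's implementation. The in-memory link table SITE stands in for fetching and parsing real documents, and the crawl function and its allowed predicate (a hook for host, path, or custom restrictions) are hypothetical names.

```python
from collections import deque


def crawl(start, links, max_depth, allowed=lambda url: True):
    """Breadth-first traversal; returns {url: depth} for every page reached.

    links is a hypothetical map of page -> list of pages it links to.
    allowed is a predicate that can veto a link before it is followed.
    """
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in depth_of and allowed(target):
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of


# Made-up site structure for illustration.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
    "/a/1": ["/deep"],
}

print(crawl("/", SITE, max_depth=1))
print(crawl("/", SITE, max_depth=2))
```

Raising max_depth from 1 to 2 grows the result from three pages to five in this small example, which hints at how quickly the number of links can grow on a real site.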

Log Files

Templeton provides a number of log files while it operates:

Remote Mapping. This log file contains a list of each web page that was accessed, the links found on each page, and other useful information such as robot exclusions and unreachable links/hosts. Each entry includes the page's reference point and the number of links you would need to follow to reach the page.
Local Mapping. Similar to remote mapping, the local map file tells where each copied file was placed on the local file system.
Server Identification. This optional log file maintains a list of servers visited, including the DNS name and type of WWW server that was found.
Mailto Listing. This optional log file contains a list of e-mail addresses that were found in the HTML documents and can be very useful for generating mailing lists.

Network

These features incorporate network information.

E-mail address. A good web browser/robot informs each server "who" is running the software. This is normally your e-mail address. Since the determined address may not be the "correct" e-mail address, Templeton allows you to modify this field.
HTTP Proxy Support. For people who must go through a proxy server to reach sites beyond a firewall, Templeton allows the use of a proxy server.
Spoof Support. Some web servers refuse to pass data to "unsupported" browsers. This is usually seen with non-Netscape viewers. Spoofing allows Templeton to camouflage its name and appear as a different browser.
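
In HTTP terms, the e-mail address and spoofing features above correspond to request header fields: the standard From field carries a contact address and User-Agent carries the client's name. The sketch below uses Python's urllib to build (but not send) such a request. It only illustrates the idea and is not a description of how Templeton sets these fields; the address and browser name are placeholders.

```python
from urllib.request import Request


def build_request(url, contact_email, agent_name):
    """Build an HTTP request that identifies who is running the robot."""
    headers = {
        "From": contact_email,     # contact address of the person running the robot
        "User-Agent": agent_name,  # the name the client presents to the server
    }
    return Request(url, headers=headers)


# Placeholder values; nothing is sent over the network here.
req = build_request("http://www.cs.tamu.edu/faculty/",
                    "webmaster@example.org",
                    "ExampleBrowser/1.0")
print(req.header_items())
```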

Advanced Features

Templeton has many features that are considered "advanced."

Non-interactive Setting. Templeton can operate without user interaction. This is especially useful for automated retrieval or backups of web documents.
System commands. Templeton has the ability to execute other applications on the retrieved documents.


* Neal's Web Conjecture: Yahoo is reachable from within 8 links of any web page that has links to other machines.
Neal's Other Web Conjecture: You don't want to mirror or map Yahoo.